Dirichlet allocation
- Asia > Middle East > Jordan (0.05)
- North America > Canada (0.04)
- Europe > Portugal > Lisbon > Lisbon (0.04)
- North America > United States (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
- Asia > China > Shaanxi Province > Xi'an (0.04)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.66)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.48)
Analysis of Variational Bayesian Latent Dirichlet Allocation: Weaker Sparsity Than MAP
Shinichi Nakajima, Issei Sato, Masashi Sugiyama, Kazuho Watanabe, Hiroko Kobayashi
Latent Dirichlet allocation (LDA) is a popular generative model of various objects such as texts and images, where an object is expressed as a mixture of latent topics. In this paper, we theoretically investigate variational Bayesian (VB) learning in LDA. More specifically, we analytically derive the leading term of the VB free energy under an asymptotic setup, and show that there exist transition thresholds in Dirichlet hyperparameters around which the sparsity-inducing behavior drastically changes. Then we further theoretically reveal the notable phenomenon that VB tends to induce weaker sparsity than MAP in the LDA model, the opposite of its behavior in other models. We experimentally demonstrate the practical validity of our asymptotic theory on real-world Last.FM music data.
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.05)
- South America > Paraguay > Asunción > Asunción (0.04)
- North America > United States (0.04)
- (3 more...)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.96)
- Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.72)
- Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.72)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.69)
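The VB-versus-MAP sparsity contrast described in the abstract can be seen in miniature with a toy Dirichlet-multinomial calculation. This is an illustrative numpy sketch under simplified assumptions, not the paper's asymptotic free-energy analysis:

```python
import numpy as np

# Toy illustration: with a Dirichlet(alpha) prior over topic proportions
# and observed topic counts n_k, the MAP estimate is proportional to
# max(n_k + alpha - 1, 0), which prunes rarely-used topics to exactly zero
# when alpha < 1, while the VB posterior mean is proportional to
# n_k + alpha, which only shrinks them -- weaker sparsity than MAP.
def map_proportions(counts, alpha):
    theta = np.maximum(counts + alpha - 1.0, 0.0)
    return theta / theta.sum()

def vb_mean_proportions(counts, alpha):
    theta = counts + alpha
    return theta / theta.sum()

counts = np.array([9.0, 1.0, 0.0, 0.0])  # two topics unused in the data
alpha = 0.5                              # below the sparsity threshold of 1
print(map_proportions(counts, alpha))     # unused topics driven to exactly 0
print(vb_mean_proportions(counts, alpha)) # unused topics kept, just shrunk
```

The threshold behavior around the hyperparameter (here, alpha crossing 1) is a one-dimensional caricature of the transition thresholds the paper derives for the full LDA free energy.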
Spectral Methods for Supervised Topic Models
Supervised topic models simultaneously model the latent topic structure of large collections of documents and a response variable associated with each document. Existing inference methods are based on either variational approximation or Monte Carlo sampling. This paper presents a novel spectral decomposition algorithm to recover the parameters of supervised latent Dirichlet allocation (sLDA) models. The Spectral-sLDA algorithm is provably correct and computationally efficient. We prove a sample complexity bound and subsequently derive a sufficient condition for the identifiability of sLDA. Thorough experiments on a diverse range of synthetic and real-world datasets verify the theory and demonstrate the practical effectiveness of the algorithm.
- Asia > Middle East > Jordan (0.04)
- Asia > China (0.04)
- South America > Paraguay > Asunción > Asunción (0.04)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.96)
- Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.93)
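The "provably correct, likelihood-free" flavor of spectral recovery can be illustrated in the simplest possible setting. This is not the Spectral-sLDA algorithm itself (which uses higher-order moments and whitening); it is a toy sketch under orthonormal-topic and exact-moment assumptions:

```python
import numpy as np

# When the population word co-occurrence moment has the form
# M2 = A diag(w) A^T with orthonormal topic columns in A, an
# eigendecomposition of M2 recovers the topics exactly -- no
# likelihood-based inference is needed.
V, K = 6, 2
A = np.zeros((V, K))
A[:3, 0] = 1 / np.sqrt(3)   # topic 1 lives on words 0-2
A[3:, 1] = 1 / np.sqrt(3)   # topic 2 lives on words 3-5 (disjoint support)
w = np.array([0.7, 0.3])    # topic mixing weights

M2 = A @ np.diag(w) @ A.T               # population second moment
eigvals, eigvecs = np.linalg.eigh(M2)   # eigenvalues in ascending order
top = eigvecs[:, -K:]                   # eigenvectors of the K largest eigenvalues
# Up to sign and ordering, the recovered directions match the topic columns of A.
```

In practice the moments are estimated from data, the topics are not orthogonal, and sLDA's response variable enters through additional moments; the sample complexity bound in the abstract quantifies how estimation error in the moments propagates to the recovered parameters.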
Topic Modeling in Marathi
Sanket Shinde, Raviraj Joshi
While topic modeling in English is a prevalent and well-explored area, topic modeling for Indic languages remains relatively unexplored. The limited availability of resources, diverse linguistic structures, and unique challenges posed by Indic languages contribute to the scarcity of research and applications in this domain. Despite the growing interest in natural language processing and machine learning, there exists a noticeable gap in the comprehensive exploration of topic modeling methodologies tailored specifically for languages such as Hindi, Marathi, Tamil, and others. In this paper, we examine several topic modeling approaches applied to the Marathi language. Specifically, we compare various BERT and non-BERT approaches, including multilingual and monolingual BERT models, using topic coherence and topic diversity as evaluation metrics. Our analysis provides insights into the performance of these approaches for Marathi language topic modeling. The key finding of the paper is that BERTopic, when combined with BERT models trained on Indic languages, outperforms LDA in terms of topic modeling performance.
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- Asia > India > Tamil Nadu > Chennai (0.04)
- Asia > India > Maharashtra (0.04)
- Africa (0.04)
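Topic diversity is commonly computed as the fraction of unique words among the top-k words across all topics; whether this paper uses exactly that variant is an assumption. A minimal sketch of the common definition:

```python
# Topic diversity: fraction of unique words among the top-k words of all
# topics. 1.0 means no word is shared between topics; values near 0 mean
# the topics are highly redundant.
def topic_diversity(topics, k=10):
    """topics: list of word lists, each ordered by within-topic weight."""
    top_words = [w for topic in topics for w in topic[:k]]
    return len(set(top_words)) / len(top_words)

# Illustrative (hypothetical) topics; "model" appears in both.
topics = [["model", "topic", "word", "latent"],
          ["bert", "marathi", "language", "model"]]
print(topic_diversity(topics, k=4))  # 7 unique of 8 -> 0.875
```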
Reliability of Topic Modeling
Kayla Schroeder, Zach Wood-Doughty
Topic models allow researchers to extract latent factors from text data and use those variables in downstream statistical analyses. However, these methodologies can vary significantly due to initialization differences, randomness in sampling procedures, or noisy data. Reliability of these methods is of particular concern as many researchers treat learned topic models as ground truth for subsequent analyses. In this work, we show that the standard practice for quantifying topic model reliability fails to capture essential aspects of the variation in two widely-used topic models. Drawing from an extensive literature on measurement theory, we provide empirical and theoretical analyses of three other metrics for evaluating the reliability of topic models. On synthetic and real-world data, we show that McDonald's $\omega$ provides the best encapsulation of reliability. This metric provides an essential tool for validation of topic model methodologies that should be a standard component of any topic model-based research.
- Asia > Middle East > Jordan (0.04)
- North America > United States > Ohio > Franklin County > Columbus (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
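For reference, McDonald's $\omega$ (omega total) for a one-factor model with standardized loadings $\lambda_i$ is $(\sum_i \lambda_i)^2 / ((\sum_i \lambda_i)^2 + \sum_i (1 - \lambda_i^2))$; how such loadings are estimated for topic models is the paper's contribution and is not reproduced here. A hedged sketch of the formula itself:

```python
import numpy as np

# McDonald's omega (omega total) for a one-factor model with standardized
# loadings: the ratio of common-factor variance to total variance, where
# the error variance of each standardized item is 1 - lambda_i**2.
def mcdonalds_omega(loadings):
    loadings = np.asarray(loadings, dtype=float)
    common = loadings.sum() ** 2
    error = (1.0 - loadings ** 2).sum()
    return common / (common + error)

print(mcdonalds_omega([0.8, 0.8, 0.8, 0.8]))  # 10.24 / 11.68, about 0.877
```

Higher loadings (runs of the topic model that agree more strongly with a common factor) push omega toward 1, matching its reading as a reliability coefficient.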
Priors for Diversity in Generative Latent Variable Models
Probabilistic latent variable models are one of the cornerstones of machine learning. They offer a convenient and coherent way to specify prior distributions over unobserved structure in data, so that these unknown properties can be inferred via posterior inference. Such models are useful for exploratory analysis and visualization, for building density models of data, and for providing features that can be used for later discriminative tasks. A significant limitation of these models, however, is that draws from the prior are often highly redundant due to i.i.d. sampling.
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > United States > Florida > Palm Beach County > Boca Raton (0.04)
- Asia > Middle East > Jordan (0.04)
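One family of diversity-inducing priors, determinantal point processes (DPPs), scores a set of latent components by the determinant of a similarity-kernel submatrix; whether this matches the paper's construction is an assumption here, and the kernel and vectors below are purely illustrative:

```python
import numpy as np

# DPP intuition: the unnormalized probability of a set of components is
# det(L_S) for the kernel submatrix L_S, which is large for mutually
# dissimilar (near-orthogonal) components and collapses toward zero for
# redundant ones -- exactly the anti-redundancy an i.i.d. prior lacks.
def dpp_score(vectors):
    V = np.asarray(vectors, dtype=float)
    L = V @ V.T                 # Gram (similarity) kernel over the set
    return np.linalg.det(L)

diverse   = [[1.0, 0.0], [0.0, 1.0]]                 # orthogonal pair
redundant = [[1.0, 0.0], [0.9, np.sqrt(1 - 0.81)]]   # highly correlated pair
print(dpp_score(diverse), dpp_score(redundant))      # 1.0 vs 0.19
```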